
Languages and Search Engines

Author: Jesper Jørgensen
Posted: 1/17/2006 10:00:00 AM

Designing websites in multiple languages requires special attention if you want to make the most of search engines like Google, which crawl your website from the outside through its public URLs. Knowing how webcrawlers work improves the chance that you get it right on the first try.

Unless you have special knowledge about a specific webcrawler and only want to optimize your site for that engine, you should assume the following as facts:

  1. Webcrawlers do not perform FORM posts.
  2. Webcrawlers do not execute JavaScript.
  3. Webcrawlers do not keep session state.
  4. Webcrawlers keep only one index entry for a particular URL.
  5. Webcrawlers navigate only by following links of the <a href="..."> style.
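
The assumptions above can be sketched as a deliberately naive crawler. This is only an illustration of the stated assumptions, not a description of how any real search engine works; the `crawl` function, the `LinkExtractor` class and the `fetch` callback are hypothetical names made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags only; forms and scripts are ignored."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Follow <a href> links from start_url, keeping one entry per URL.

    fetch(url) is a hypothetical callback that returns the HTML for a URL,
    or None if the page does not exist. No cookies are passed along, so
    every request looks like a brand-new visitor to the site.
    """
    index = {}
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        if url in index:
            continue  # only one index entry per URL
        html = fetch(url)
        if html is None:
            continue
        index[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute, _fragment = urldefrag(urljoin(url, href))
            queue.append(absolute)
    return index
```

Fed with a start page that contains a plain link, a POST form and a script redirect, this sketch only ever reaches the page behind the plain link.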

Let’s say your website www.yoursite.com shows up by default in American English, but you have a dropdown box where you can switch to French and German. Imagine you navigated to www.yoursite.com/news.html. After selecting, for example, German, you are still on the same page, but because your site logic saved German in a session variable, your page is now shown in German.

Now imagine what will happen when you submit www.yoursite.com to Google and ask it to crawl your website. Your website will only be indexed in American English, for the following reasons:

  1. The crawler is not able to look beyond the form post (or JavaScript) needed to submit the value from your dropdown box.
  2. Even if it got past the dropdown, it would not be able to keep the session. It would immediately forget the setting for German and would see all other pages in American English.
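
A small simulation may make the difference between a human visitor and a crawler clearer. It is a sketch under the assumptions listed earlier; the page texts, the `render` function and both visit functions are invented for the illustration and have nothing to do with Sitecore's own API.

```python
import re

# Hypothetical page headings per language; not taken from any real site.
TEXTS = {"en": "News", "de": "Nachrichten", "fr": "Nouvelles"}


def render(path, session):
    """Render a page in the language stored in the visitor's session."""
    language = session.get("language", "en")  # American English by default
    return (
        f"<h1>{TEXTS[language]}</h1>"
        '<a href="/news.html">news</a><a href="/about.html">about</a>'
    )


def browser_visit(paths, chosen_language):
    """A human picks a language in the dropdown once; the session remembers it."""
    session = {"language": chosen_language}
    return {path: render(path, session) for path in paths}


def crawler_visit(start_path):
    """A crawler follows links, but every request starts with an empty session."""
    seen = {}
    queue = [start_path]
    while queue:
        path = queue.pop(0)
        if path in seen:
            continue
        html = render(path, session={})  # no cookie, so no remembered language
        seen[path] = html
        queue.extend(re.findall(r'href="([^"]+)"', html))
    return seen
```

In this sketch `browser_visit(["/news.html"], "de")` yields the German heading, while `crawler_visit("/")` reaches every linked page but only ever receives the default English version.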

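Now assume that instead of the dropdown box, you just add images with country flags to the page, and have them link to, for example, www.yoursite.com/news.html?language=fr*. You would probably still have to save the setting in a session variable to see the rest of the site in French. How would it look to the webcrawler? It can follow the link and index that one URL in French, but because it keeps no session state, every page it reaches from there is served in American English again.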

*) This is just an example; the standard Sitecore syntax for setting the language may be different.
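
The flag-link scenario can be tried out with a few lines of code. This is a minimal sketch: the `language` parameter is the example from the text, while the `render` function and the page texts are hypothetical and only stand in for whatever your site logic does.

```python
from urllib.parse import urlsplit, parse_qs

GREETING = {"en": "Welcome", "fr": "Bienvenue"}  # hypothetical page text


def render(url, session):
    """Serve a page; a ?language= parameter is stored in the session."""
    query = parse_qs(urlsplit(url).query)
    if "language" in query:
        session["language"] = query["language"][0]
    language = session.get("language", "en")
    return language, GREETING[language]


# A visitor with a browser keeps one session across requests.
session = {}
first = render("/news.html?language=fr", session)
second = render("/about.html", session)

# A crawler sends every request without a session cookie.
crawler_first = render("/news.html?language=fr", {})
crawler_second = render("/about.html", {})
```

Here the browser visitor gets French for both pages, whereas the crawler gets French only for the one URL that carries the parameter and the default language for the next page.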

So what do you do? One trick that I had not really thought of earlier, and that may be worth a try, is as follows:

The neat solution: have a separate domain name for each language, for example www.yoursite.fr for French. Your site should then simply show up in the language of the corresponding domain, without having to use a session variable. A fine example of this is www.nilfisk-advance.dk, which shows up in Danish, and www.nilfisk-advance.pt, which shows up in Portuguese. If you search Google for "Na Nilfisk-Advance ocorrem coisas", www.nilfisk-advance.pt will show up as the first (and only) link. Having read this article, that may be just what you want to happen with your own languages. The reasons this works so nicely are:

  1. Google just needs an initial link to the root of the site (www.nilfisk-advance.pt) to crawl the site in this language.
  2. No session is needed to remember the language.
  3. Each page has a unique URL in the different languages because the domain name is unique.

Therefore www.nilfisk-advance.pt/Info/News.html and www.nilfisk-advance.dk/Info/News.html are seen as different pages by Google, and indexed individually.
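
As a rough sketch of the idea, the language can be derived from the host name of each request alone. The mapping below is an assumption made for illustration (www.yoursite.de in particular is invented), and `language_for_url` is a hypothetical helper, not a Sitecore function.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from host name to content language.
LANGUAGE_BY_HOST = {
    "www.yoursite.com": "en",
    "www.yoursite.fr": "fr",
    "www.yoursite.de": "de",
}


def language_for_url(url, default="en"):
    """Pick the language from the host name alone; no session is consulted.

    The URL must include a scheme (for example http://), otherwise urlsplit
    cannot identify the host and the default language is returned.
    """
    host = (urlsplit(url).hostname or "").lower()
    return LANGUAGE_BY_HOST.get(host, default)
```

Because the decision depends only on the URL, a crawler that remembers nothing between requests still gets every page of www.yoursite.fr in French, and the same path on two domains stays two distinct URLs in its index.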

